Goto

Collaborating Authors

 gaussian mixture classification


Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

Neural Information Processing Systems

We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow. In the full-batch limit we recover the standard gradient flow. We apply dynamical mean-field theory from statistical physics to track the dynamics of the algorithm in the high-dimensional limit via a self-consistent stochastic process. We explore the performance of the algorithm as a function of control parameters shedding light on how it navigates the loss landscape.


Review for NeurIPS paper: Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

Neural Information Processing Systems

Additional Feedback: - Two-cluster case is a convex optimization of the linear model and has been investigated in a bit different context [21]. Therefore, the three cluster case is more untrivial and exciting. However, I am not sure that the DMFT formulation in the three-cluster case is tractable enough to analyze SGD dynamics' behavior. Since the three-cluster case is non-convex optimization, I suspect that DMFT equations (20) have some local optima. If this is the case, it becomes unclear how typical the dynamics shown in experiments on three-cluster cases are.


Review for NeurIPS paper: Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

Neural Information Processing Systems

The reviewers agree that the techniques leveraged in the paper should be of interest to the wider NeurIPS community. Furthermore, even though the setting analyzed is relatively simple, the analysis is challenging, and understanding the effects of batch size is a problem of broad interest.


Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

Neural Information Processing Systems

We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow. In the full-batch limit we recover the standard gradient flow. We apply dynamical mean-field theory from statistical physics to track the dynamics of the algorithm in the high-dimensional limit via a self-consistent stochastic process.